

Conservative Offline Policy Adaptation in Multi-Agent Games

Neural Information Processing Systems

Prior research on policy adaptation in multi-agent games has often relied on online interaction with the target agent during training, which can be expensive and impractical in real-world scenarios. Inspired by recent progress in offline reinforcement learning, this paper studies offline policy adaptation, which aims to utilize the target agent's behavior data to exploit its weaknesses or enable effective cooperation. We investigate its distinct challenges of distributional shift and risk-free deviation, and propose a novel learning objective, conservative offline adaptation, that optimizes the worst-case performance against any dataset-consistent proxy model. We propose an efficient algorithm called Constrained Self-Play (CSP) that incorporates dataset information into regularized policy learning. We prove that CSP learns a near-optimal, risk-free offline adaptation policy upon convergence. Empirical results demonstrate that CSP outperforms non-conservative baselines in various environments, including Maze, predator-prey, MuJoCo, and Google Football.
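The worst-case objective can be made concrete with a toy sketch. Everything below (the payoff matrix, the two-state setup, and the function name) is invented for illustration and is not from the paper: where the dataset pins down the target's behavior we best-respond to it, and where the data is silent, any dataset-consistent proxy is possible, so a conservative policy falls back to the maximin action.

```python
import numpy as np

# Hypothetical payoff for our agent: rows = our actions, cols = target's actions.
PAYOFF = np.array([[3.0, -2.0],
                   [1.0,  1.0]])

def conservative_response(observed_target_action=None):
    """Best-respond where the dataset pins down the target's action;
    elsewhere, any dataset-consistent proxy is possible, so play maximin."""
    if observed_target_action is not None:
        return int(np.argmax(PAYOFF[:, observed_target_action]))
    # Off-dataset: optimize the worst case over all possible target actions.
    return int(np.argmax(PAYOFF.min(axis=1)))

# The dataset covers one state (target plays action 0); another state is unseen.
on_data = conservative_response(observed_target_action=0)  # best response: action 0
off_data = conservative_response()                         # maximin fallback: action 1
```

Note how the risky action 0 (payoff 3 or -2) is only chosen when the data supports it; off-dataset, the safe action 1 guarantees a payoff of 1 against any proxy.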


Finding Friend and Foe in Multi-Agent Games

Neural Information Processing Systems

AI for multi-agent games like Go, Poker, and Dota has made great strides in recent years. Yet none of these games addresses the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on The Resistance: Avalon, the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play.
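CFR builds on regret matching. As a minimal, self-contained sketch (using rock-paper-scissors rather than Avalon, and not the paper's actual implementation), self-play regret matching in a zero-sum matrix game drives the *average* strategies toward a Nash equilibrium:

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (zero-sum: column player gets the negative).
PAYOFF = np.array([[0., -1.,  1.],
                   [1.,  0., -1.],
                   [-1., 1.,  0.]])

def regret_matching(payoff, iterations=20000, seed=0):
    """Self-play regret matching; the average strategies approach a Nash equilibrium."""
    rng = np.random.default_rng(seed)
    n = payoff.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]
    strategy_sums = [np.zeros(n), np.zeros(n)]
    for _ in range(iterations):
        # Current strategies: play proportionally to positive cumulative regret.
        strategies = []
        for r in regrets:
            pos = np.maximum(r, 0.0)
            strategies.append(pos / pos.sum() if pos.sum() > 0 else np.full(n, 1.0 / n))
        a0 = rng.choice(n, p=strategies[0])
        a1 = rng.choice(n, p=strategies[1])
        # Regret of each pure action versus the action actually played.
        regrets[0] += payoff[:, a1] - payoff[a0, a1]
        regrets[1] += -payOFF[a0, :] + payoff[a0, a1] if False else -payoff[a0, :] + payoff[a0, a1]
        strategy_sums[0] += strategies[0]
        strategy_sums[1] += strategies[1]
    return [s / s.sum() for s in strategy_sums]

avg_row, avg_col = regret_matching(PAYOFF)  # both approach (1/3, 1/3, 1/3)
```

DeepRole's contribution lies in extending this style of counterfactual reasoning beyond two players and in replacing tabular values with deep value networks; the sketch above only shows the underlying regret-matching dynamic.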


Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

Neural Information Processing Systems

Learning to communicate through interaction, rather than relying on explicit supervision, is often considered a prerequisite for developing a general AI. We study a setting where two agents engage in playing a referential game and, from scratch, develop the communication protocol necessary to succeed in this game. Unlike previous work, we require that the messages they exchange, both at train and test time, are in the form of a language (i.e.


Reviews: Finding Friend and Foe in Multi-Agent Games

Neural Information Processing Systems

The paper builds on well-known methods (CFR) and provides novel improvements and modifications that extend the approach to a multiplayer, hidden-role setting. This is original and creative, though the crucial role of CFR cannot be overstated. Related work appears to be adequately cited. The empirical results provide the main validation for the soundness and quality of the proposed algorithm; this is reasonable and is explained well in the paper. I have not spotted any obvious logical errors or mistakes.


Reviews: Finding Friend and Foe in Multi-Agent Games

Neural Information Processing Systems

All reviewers agree that the paper provides some nice contributions (extending CFR beyond two players and tackling Avalon) and that the authors' rebuttal successfully addresses the major concerns raised by the referees. They have responded adequately and, furthermore, open-sourced their implementation. We expect the authors, though, to carry out the promised changes (and also to improve the notation).




Reviews: Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

Neural Information Processing Systems

Increasing my score based on the authors' rebuttal. The argument that the proposed method can complement human-bot training makes sense. Also, the RL baseline experiments appear to be exhaustive. However, the claim that the learnt language is compositional should be toned down, since there is not enough evidence to support it. Old reviews: The paper proposes to use Gumbel-softmax for training sender and receiver agents in a referential game like Lazaridou (2016).
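The Gumbel-softmax trick the review refers to replaces a non-differentiable symbol draw with a temperature-controlled relaxation, so gradients can flow from receiver to sender. A minimal numpy sketch (the logits and symbol vocabulary below are made up for illustration; a real sender would produce logits from a neural network):

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """One relaxed (differentiable) sample from a categorical distribution."""
    gumbels = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    scores = (logits + gumbels) / temperature
    scores -= scores.max()                                      # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1])  # hypothetical unnormalized symbol scores
soft_symbol = gumbel_softmax(logits, temperature=0.5, rng=rng)

# The argmax of a Gumbel-perturbed draw is distributed as softmax(logits),
# so empirical symbol frequencies match the sender's underlying policy.
counts = np.bincount(
    [int(np.argmax(gumbel_softmax(logits, 0.5, rng))) for _ in range(2000)],
    minlength=3,
)
```

Lower temperatures push each relaxed sample toward a one-hot vector (a discrete symbol), at the cost of higher-variance gradients; this trade-off is what makes the trick suitable for learning sequences of symbols end-to-end.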


Towards Distraction-Robust Active Visual Tracking

Zhong, Fangwei, Sun, Peng, Luo, Wenhan, Yan, Tingyun, Wang, Yizhou

arXiv.org Artificial Intelligence

In active visual tracking, it is notoriously difficult when distracting objects appear, as distractors often mislead the tracker by occluding the target or bringing a confusing appearance. To address this issue, we propose a mixed cooperative-competitive multi-agent game, where a target and multiple distractors form a collaborative team to play against a tracker and make it fail to follow. Through learning in our game, diverse distracting behaviors of the distractors naturally emerge, thereby exposing the tracker's weaknesses, which helps enhance the distraction-robustness of the tracker. For effective learning, we then present a set of practical methods, including a reward function for distractors, a cross-modal teacher-student learning strategy, and a recurrent attention mechanism for the tracker. The experimental results show that our tracker achieves the desired distraction-robust active visual tracking and generalizes well to unseen environments. We also show that the multi-agent game can be used to adversarially test the robustness of trackers.


DeepMind Wants to Reimagine One of the Most Important Algorithms in Machine Learning

#artificialintelligence

I recently started an AI-focused educational newsletter that already has over 80,000 subscribers. TheSequence is a no-BS (no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Principal component analysis (PCA) is one of the key algorithms in any machine learning curriculum. Initially created in the early 1900s, PCA is a fundamental algorithm for understanding data in high-dimensional spaces, which are common in deep learning problems.
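For readers who have not seen PCA stated in code, a minimal sketch of the classical eigendecomposition formulation (this is the textbook algorithm, not whatever reformulation DeepMind proposes; the synthetic data below is made up for illustration):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via eigendecomposition."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return Xc @ eigvecs[:, top], eigvals[top]

# Synthetic 2-D data stretched along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])
Z, explained = pca(X, k=1)  # Z: projected data; explained: variance captured
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is exactly the "explained variance" that PCA ranks components by.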